AD-A141 365


UNCLASSIFIED 


APPROPRIATENESS MEASUREMENT WITH POLYCHOTOMOUS ITEM
RESPONSE MODELS AND S.. (U) ILLINOIS UNIV AT URBANA
MODEL BASED MEASUREMENT LAB  F DRASGOW ET AL.  APR 84
MEASUREMENT SER-84-1  N00014-79-C-0752  F/G 5/10




Appropriateness Measurement
with Polychotomous Item Response Models
and Standardized Indices


Fritz Drasgow, Michael V. Levine
and Esther A. Williams


Approved for public release: distribution unlimited.
Reproduction in whole or in part is permitted for any purpose of the United States Government.

REPORT DOCUMENTATION PAGE

1. REPORT NUMBER: Measurement Series 84-1

4. TITLE (and Subtitle): Appropriateness Measurement with Polychotomous Item Response Models and Standardized Indices

5. TYPE OF REPORT & PERIOD COVERED: Technical Report

7. AUTHOR(s): Fritz Drasgow, Michael V. Levine and Esther A. Williams

8. CONTRACT OR GRANT NUMBER(s): N00014-79C-0752; N00014-83K-0397

9. PERFORMING ORGANIZATION NAME AND ADDRESS: Model Based Measurement Laboratory, University of Illinois, Urbana, IL 61820

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: 61153N RR042-04; NR 154-445; NR 150-518

11. CONTROLLING OFFICE NAME AND ADDRESS: Personnel and Training Research Programs, Office of Naval Research (Code 442PT), Arlington, VA 22217

13. NUMBER OF PAGES: 60

16. DISTRIBUTION STATEMENT (of this Report)


Approved  for  public  release:  distribution  unlimited. 

Reproduction  in  whole  or  in  part  is  permitted  for  any  purpose  of  the  United 
States  Government. 


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)


19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

Latent  trait  theory,  item  response  theory,  multiple  choice  test, 
appropriateness  measurement,  person  fit,  polychotomous  models, 
appropriateness  index,  wrong  answers. 


20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

The test scores of some examinees on a multiple-choice test may not provide satisfactory measures of their abilities. The goal of appropriateness measurement is to identify such individuals. Earlier theoretical and experimental work considered examinees answering all, or almost all, test items. This article reports research that extends appropriateness measurement methods to examinees with moderately high nonresponse rates. These methods treat nonresponse as if it were a deliberate option choice and then attempt to measure the "appropriateness" of the pattern of option choices. Earlier studies used only the dichotomous pattern of right and "not right" answers. A general polychotomous model is introduced along with a technique called "standardization" designed to reduce the observed confounding between measured appropriateness and ability. A standardized appropriateness index based on a polychotomous model yielded higher rates of detection of simulated spuriously low examinees than the analogous index based on a dichotomous model. However, the converse was true for simulated spuriously high examinees. Standardization was found to reduce greatly the interaction between ability and measured appropriateness.


Appropriateness  Measurement  with  Polychotomous 
Item  Response  Models  and  Standardized  Indices 
Fritz  Drasgow,  Michael  V.  Levine  and  Esther  A.  Williams 
University  of  Illinois 


Running  head:  Appropriateness  Measurement 
Address  correspondence  to:  Fritz  Drasgow 

Department  of  Psychology 
University  of  Illinois 
603  E.  Daniel  Street 
Champaign,  IL  61820 
USA 


Appropriateness  Measurement 
1 

Abstract 

The  test  scores  of  some  examinees  on  a  multiple-choice  test  may  not  provide 
satisfactory  measures  of  their  abilities.  The  goal  of  appropriateness 
measurement  is  to  identify  such  individuals.  Earlier  theoretical  and  exper¬ 
imental  work  considered  examinees  answering  all,  or  almost  all,  test  items. 

This  article  reports  research  that  extends  appropriateness  measurement  methods 
to  examinees  with  moderately  high  nonresponse  rates.  These  methods  treat 
nonresponse  as  if  it  were  a  deliberate  option  choice  and  then  attempt  to 
measure  the  "appropriateness"  of  the  pattern  of  option  choices.  Earlier 
studies  used  only  the  dichotomous  pattern  of  right  and  "not  right"  answers. 

A  general  polychotomous  model  is  introduced  along  with  a  technique  called 
"standardization"  designed  to  reduce  the  observed  confounding  between  measured 
appropriateness  and  ability.  A  standardized  appropriateness  index  based  on  a 
polychotomous  model  yielded  higher  rates  of  detection  of  simulated  spuriously 
low  examinees  than  the  analogous  index  based  on  a  dichotomous  model.  However, 
the  converse  was  true  for  simulated  spuriously  high  examinees.  Standardization 
was  found  to  reduce  greatly  the  interaction  between  ability  and  measured 
appropriateness. 




Appropriateness  Measurement  with  Polychotomous 
Item  Response  Models  and  Standardized  Indices 
Fritz  Drasgow,  Michael  V.  Levine  and  Esther  A.  Williams 

1.  Introduction 

An  examinee's  score  on  a  standardized  multiple  choice  test  may  fail 
to  provide  a  useful  measure  of  ability  for  various  reasons.  The  score  may 
be  too  high  because  the  examinee  began  the  test  with  memorized  answers  to 
several  questions  or  because  the  examinee  copied  answers  to  several  questions 
from a much brighter examinee. The score may be too low because the examinee
(a) made an alignment error over a block of items, answering, say, the eleventh
item in the tenth space, the twelfth item in the eleventh space, . . .;
(b) interpreted several very easy items in creative ways and came to
well-reasoned, albeit scored-as-incorrect, answers; (c) was tested in an unfamiliar
language; (d) failed to answer items on which he/she was able to eliminate
several incorrect options; or (e) worked with extreme care and consequently
never reached easy items on a power test.

In  all  of  these  examples,  the  examinee  often  produces  an  unusual 
pattern  of  answers  with  relatively  many  easy  items  answered  incorrectly  and 
hard  items  answered  correctly.  Appropriateness  measurement  (Levine  and  Rubin, 
1979;  Levine  and  Drasgow,  1982,1983a;  Drasgow,  1982;  Hulin,  Drasgow  and 
Parsons,  1983,  Chapter  4)  is  a  model-based  attempt  to  control  test  pathologies 
by  recognizing  unusual  patterns.  A  model  is  fit  to  the  item  response  patterns 
of  a  large  sample  of  presumably  normal  examinees.  Subsequently,  individual 
examinees  and  their  response  patterns  can  be  ordered  according  to  how  well 
they  are  fit  by  the  group  model. 




Earlier  appropriateness  measurement  work  was  based  on  models  for 
dichotomous  data  and  therefore  was  limited  in  two  important  ways.  The 
pattern  of  nonresponse,  which  may  have  high  diagnostic  value,  was  ignored. 

In  fact,  earlier  studies  were  forced  to  exclude  examinees  with  high  rates 
of  nonresponse  or  introduce  ad  hoc  corrections  for  omitting.  Secondly,  the 
earlier  studies  failed  to  take  cognizance  of  which  wrong  option  was  chosen 
and  therefore  probably  were  not  as  sensitive  to  some  irregularities  as  they 
might  have  been. 

The  work  reported  in  this  paper  is  intended  to  advance  appropriateness 
measurement  in  two  ways.  A  method  is  introduced  for  extending  appropriateness 
measurement  to  samples  of  examinees  with  moderately  high  rates  of  nonresponse. 
Simultaneously  methods  sensitive  to  option  choice  are  introduced.  It  will 
be  seen  that  in  pursuing  these  goals,  progress  has  been  made  in  comparing  the 
appropriateness  of  scores  at  different  ability  levels. 




The  goal  of  appropriateness  measurement  is  to  identify  examinees  with 
inappropriate  scores  solely  from  their  response  patterns.  This  is  done  in 
two  steps.  First,  a  general  psychometric  model  is  fit  to  a  large  sample  of 
nominally normal examinees. Then an index of goodness of fit, or appropriateness
index, is used to measure the degree to which each individual examinee's
response  pattern  fits  the  model  used  to  characterize  normal  behavior. 

In  the  first  large  scale,  systematic  appropriateness  measurement  study, 
Levine  and  Rubin  (1979)  showed  that  under  ideal  conditions  certain  test 
anomalies  were  detectable.  They  modified  simulated  item  response  data  to 
create  answer  sheets  with  spuriously  high  and  spuriously  low  scores.  Three 
types  of  appropriateness  indices  were  found  to  classify  normal  and  moderately 
aberrant  examinees  rather  well.  However,  their  study  was  limited  to  simulated 
data conforming to the "three parameter logistic model" (Birnbaum, 1968).
Furthermore, their use of simulation item parameters (rather than estimated item
parameters)  left  open  the  question  of  how  well  appropriateness  measurement 
would  perform  in  applications  requiring  parameter  estimation. 

Levine  and  Drasgow  (1982,1983a)  extended  the  basic  results  to  more 
realistic conditions. They used actual and simulated data to study
systematically the effects of the unavoidable inclusion of aberrant examinees in
samples of nominally normal examinees and of errors in estimating item
parameters. They found good aberrance detection with actual and simulated data
despite  model  misspecification  and  parameter  estimation  error.  However, 
like  Levine  and  Rubin  they  only  considered  examinees  who  answered  all  or 
nearly  all  items  and  they  ignored  which  wrong  answer  was  chosen  when  a  wrong 
answer  was  chosen. 




The  research  reported  in  this  paper  extends  earlier  appropriateness 
measurement  studies  to  (1)  examinees  with  substantial  nonresponse  rates 
by using (2) polychotomous models and (3) standardized indices.
Nonresponse to an A-option multiple choice item is coded as the "choice" of an
(A+1)th option. In this way, every item is, in a technical sense, answered.
A polychotomous psychometric model is developed to quantify the probability
of each option choice (including the (A+1)th) at each ability level. Aberrance
is measured by the goodness of fit of the polychotomous model.
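
The recoding of nonresponse described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `code_responses`, the use of `None` to mark an omitted or not-reached item, and the five-option example are hypothetical and do not come from the report.

```python
from typing import List, Optional


def code_responses(raw: List[Optional[int]], n_options: int) -> List[int]:
    """Recode one examinee's answers so that every item is 'answered'.

    raw[i] is the chosen option (1..n_options), or None when the item was
    omitted or not reached. None is mapped to the extra category
    n_options + 1, i.e. the (A+1)th option of the text.
    """
    coded = []
    for choice in raw:
        if choice is None:
            coded.append(n_options + 1)
        elif 1 <= choice <= n_options:
            coded.append(choice)
        else:
            raise ValueError(f"option {choice} is outside 1..{n_options}")
    return coded


# Hypothetical answer sheet for five 5-option items, two of them unanswered.
print(code_responses([3, None, 1, 5, None], n_options=5))  # [3, 6, 1, 5, 6]
```

After this recoding a response vector has no missing entries, so a likelihood can be evaluated for every examinee regardless of how many items were skipped.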

Concern for omitting has focused our attention on conditional
distributions of indices and the problems of comparing appropriateness index values
at  different  ability  levels.  Examinees  omitting  different  items  are,  in 
effect,  taking  different  tests.  A  low  appropriateness  index  value  in  the 
presence  of  substantial  omitting  may  be  less  indicative  of  aberrance  than 
a  higher  index  value  for  another  examinee  with  a  different  nonresponse  pattern. 
Thus  we  have  attempted  to  introduce  a  "common  metric"  for  indices.  Our 
strategy  has  been  to  divide  examinees  into  relatively  homogeneous  groups  by 
using  a  gross  feature  of  the  response  pattern,  to  approximate  the  distribution 
of  the  index  values  within  each  group,  and  to  use  the  approximated  conditional 
distributions  to  define  a  transformation  of  indices  to  a  common  distribution. 

The "gross feature" is the maximum likelihood ability estimate, which we expected
to reflect omitting rates. The common distribution was the standard normal.

This process, which we call standardization, has been useful in controlling the
confounding of ability and appropriateness.


3.  Option  Response  Functions  and  a  Constant  Ability,  Polychotomous  Model 

As  a  descriptive  model  for  normal  test  taking  behavior  we  have  used  the 
most  general  unidimensional,  locally  independent  constant  ability  model  that 
generalizes  the  three-parameter  logistic  model.  It  can  be  shown  that  any 
unidimensional  model  with  three  parameter  logistic  item  characteristic 
curves  and  conditionally  independent  item  responses  is  a  special  case  of 
this  model,  which  we  call  the  histogram  model. 

To express the assumptions of the histogram model let

$$V = \langle V_1, V_2, \ldots, V_n \rangle$$

denote the random vector of option choices and

$$v = \langle v_1, v_2, \ldots, v_n \rangle$$

be a vector of constants indicating option choices. It is assumed that for a
unidimensional ability random variable $\theta$

$$\mathrm{Prob}\{V_1 = v_1 \ \&\ V_2 = v_2 \ \&\ \cdots \ \&\ V_n = v_n \mid \theta = t\} = \prod_{i=1}^{n} \mathrm{Prob}\{V_i = v_i \mid \theta = t\} \tag{1}$$

for all $v$ and real $t$. Furthermore, it is assumed that if $v_i^*$ is the
correct option choice for the $i$th item, then for some $a_i$, $b_i$, $c_i$ and for
all $t$

$$\mathrm{Prob}\{V_i = v_i^* \mid \theta = t\} = c_i + (1 - c_i)\{1 + \exp[-a_i(t - b_i)]\}^{-1}. \tag{2}$$
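
As a small worked illustration of equation (2), the correct-option curve can be evaluated directly. The parameter values below are hypothetical and chosen only so that the arithmetic is easy to follow; at $t = b_i$ the curve equals $c_i + (1 - c_i)/2$.

```python
import math


def three_pl(t: float, a: float, b: float, c: float) -> float:
    """Equation (2): probability of choosing the correct option at ability t."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (t - b)))


# Hypothetical item: discrimination a, difficulty b, lower asymptote c.
a, b, c = 1.2, 0.0, 0.2
for t in (-3.0, 0.0, 3.0):
    print(t, round(three_pl(t, a, b, c), 3))
```

The curve rises from the lower asymptote `c` toward one as ability increases, which is the only structure the histogram model imposes on the correct option.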



This  model  is  tentatively  introduced,  not  as  a  plausible  model  for  test 
taking  behavior,  but  as  an  admittedly  crude  descriptive  model  for  test  data  that 
may  or  may  not  adequately  support  the  extension  of  appropriateness  measurement 
techniques  to  polychotomous  data  with  high  omitting  rates.  The  functions 

$$P_{ij}(t) = \mathrm{Prob}\{\text{option } j \text{ is chosen on item } i \mid \theta = t\}, \quad j = 1, \ldots, A+1 \tag{3}$$

generalize the item response function of item response theory and are called
option response functions. Their estimation is discussed in the following
section.

The likelihood of a response pattern can be easily expressed in terms
of the option response functions. According to this model, the probability
of sampling an examinee with response pattern $V = v$ from the subpopulation
of all examinees with ability $\theta = t$ is

$$P(V = v \mid \theta = t) = \prod_{i=1}^{n} \sum_{j=1}^{A+1} \delta_j(v_i)\, P_{ij}(t) \tag{4}$$

where the first $A+1$ positive integers are used as scores for option choices
and $\delta_j(k) = 1$ if $k = j$ and zero otherwise.
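
The likelihood in equation (4) can be sketched as follows. The toy option response functions are assumptions made purely for illustration: the correct-option curve follows equation (2), while the split of the remaining probability between a single wrong option and the omit category (`omit_share`) is invented and is not the report's estimated histogram curves.

```python
import math
from itertools import product


def make_item(a, b, c, omit_share):
    """Return three toy option response functions: correct, wrong, omit."""
    def correct(t):
        return c + (1.0 - c) / (1.0 + math.exp(-a * (t - b)))

    def wrong(t):
        return (1.0 - omit_share) * (1.0 - correct(t))

    def omit(t):
        return omit_share * (1.0 - correct(t))

    return [correct, wrong, omit]


def pattern_likelihood(v, curves, t):
    """Equation (4): product over items of P_ij(t) for the chosen option j."""
    like = 1.0
    for item_curves, choice in zip(curves, v):
        like *= item_curves[choice - 1](t)  # options are scored 1..A+1
    return like


# Two hypothetical items, each with options 1 (correct), 2 (wrong), 3 (omit).
curves = [make_item(1.0, -0.5, 0.2, 0.3), make_item(1.4, 0.8, 0.25, 0.1)]
t = 0.3
total = sum(pattern_likelihood(v, curves, t)
            for v in product((1, 2, 3), repeat=2))
print(round(total, 12))  # 1.0
```

Summing the likelihood over all nine possible response patterns returns one, which is a quick check that the option response functions of each item form a probability distribution at the chosen ability.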

This equation has been used to compute polychotomous maximum likelihood
ability estimates. The dichotomous model ability estimates are obtained
by maximizing the dichotomous model likelihood function

$$\prod_{i=1}^{n} [u_i P_i(t) + (1 - u_i) Q_i(t)] \tag{5}$$

where $u_i$ is one or zero according to whether $v_i$ is the correct option,
$P_i(\cdot)$ is the option response function of the correct option given in equation
(2), and $Q_i(t) = 1 - P_i(t)$.




Various  techniques  have  been  proposed  for  estimating  option  response 
functions.  Bock  (1972)  selects  a  parametric  form  for  the  functions  and 
computes  maximum  likelihood  estimates.  The  results  of  Lord  (1969,1970), 
Samejima (1981), and Levine (1982) on ability density estimation are
relevant since option characteristic curves can be represented as ratios of
ability density estimates. In particular, the probability of sampling an
examinee choosing the $j$th option from the subpopulation of examinees with
ability $t$ can be written as

$$P_{ij}(t) = p_{ij}\, f_{ij}(t) / f(t)$$

where $p_{ij}$ is the proportion of examinees choosing option $j$ for item $i$,
$f(t)$ is the probability density of $\theta$ and $f_{ij}(t)$ is the $\theta$ density in
the subpopulation selecting option $j$ for item $i$. Thus, if ability
distributions can be estimated (from the dichotomously scored data), then
option response curves can be estimated with no further specification of
their form.

An option response function is simply the regression of item option score
on ability, i.e.

$$P_{ij}(t) = E\{\delta_j(V_i) \mid \theta = t\} \tag{6}$$

where, as before, $\delta_j(k) = 1$ if $j = k$ and zero otherwise.

We have taken the simple expedient of using large sample estimates of
the regression of option score on estimated ability $\hat\theta$

$$E\{\delta_j(V_i) \mid \hat\theta = t\}$$

as $P_{ij}(t)$ estimates in this initial study. To obtain these estimates,
maximum likelihood ability estimates ($\hat\theta_d$'s) were computed for a large
sample of examinees from dichotomously scored data. The examinees were
grouped according to their $\hat\theta_d$'s. The proportion choosing each option
for each ability group was used as an estimate of a point on an option
response function. Linear interpolation was used between estimated points.
Numerical details are given in Levine and Drasgow (1983b).

These  crude  estimates  of  option  response  functions  are  not  consistent 
and  will  lead  to  systematic  errors  in  ability  estimates.  Nonetheless,  they 
permit  us  to  begin  an  evaluation  of  appropriateness  measurement  strategies 
without  first  undertaking  a  major  parameter  estimation  task. 




5.  The  Indices  and  their  Standardizations 

In this report we are exclusively concerned with generalizations of
the linear function of item scores

$$\ell_0 = \sum_{i=1}^{n} [u_i \log P_i(\hat\theta_d) + (1 - u_i) \log Q_i(\hat\theta_d)]. \tag{7}$$

Here $u_i$ is the dichotomous item score, which is one if item $i$ is answered
correctly and zero if it is answered incorrectly, and $\hat\theta_d$ maximizes the
dichotomous model likelihood function. $\ell_0$ has the advantage of being fairly easy
to compute. In comparative studies it was found to perform roughly as well
as more elaborate indices (Drasgow, 1982; Levine & Rubin, 1979).

$\ell_0$ is the maximum of the logarithm of the dichotomous model likelihood
function. The obvious generalization of $\ell_0$ to the histogram model is the
maximum of the polychotomous model log likelihood function

$$\max_{\theta} \sum_{i=1}^{n} \sum_{j=1}^{A+1} \delta_j(v_i) \log P_{ij}(\theta). \tag{8}$$

Inasmuch as our histogram model likelihood function does not have a continuous
first derivative and was complicated to maximize, the generalization

$$\ell_{0,h} = \sum_{i=1}^{n} \sum_{j=1}^{A+1} \delta_j(v_i) \log P_{ij}(\hat\theta_d) \tag{9}$$

was used. $\ell_{0,h}$ is the logarithm of the histogram model likelihood function
evaluated at the dichotomous model maximum likelihood ability estimate $\hat\theta_d$.
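
A minimal sketch of the dichotomous index follows: the log of likelihood (5) is maximized over ability and the maximized value is reported. The grid search is a stand-in chosen for transparency and is an assumption of this sketch, not the maximization routine used in the report; the item parameters and the names `log_lik` and `l0_index` are likewise hypothetical.

```python
import math


def p3(t, a, b, c):
    """Three-parameter logistic probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (t - b)))


def log_lik(u, items, t):
    """Logarithm of the dichotomous model likelihood at ability t."""
    total = 0.0
    for score, (a, b, c) in zip(u, items):
        p = p3(t, a, b, c)
        total += math.log(p) if score == 1 else math.log(1.0 - p)
    return total


def l0_index(u, items, lo=-2.054, hi=2.054, steps=400):
    """Return (theta_d, l0); theta_d is located by a plain grid search."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    theta_d = max(grid, key=lambda t: log_lik(u, items, t))
    return theta_d, log_lik(u, items, theta_d)


# Hypothetical (a, b, c) triples and a three-item right/wrong pattern.
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.9, 1.0, 0.2)]
theta_d, l0 = l0_index([1, 1, 0], items)
print(round(theta_d, 2), round(l0, 3))
```

The histogram index would reuse the same `theta_d` and simply swap in the logarithms of the estimated option response functions for the options actually chosen.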




As discussed in detail in Section 7 below, the distribution of $\ell_0$
was found to depend on ability. Therefore, two new indices were defined:

$$z_3 = [\ell_0 - E_3(\hat\theta_d)] / \sigma_3(\hat\theta_d) \tag{10}$$

and

$$z_h = [\ell_{0,h} - E_h(\hat\theta_d)] / \sigma_h(\hat\theta_d). \tag{11}$$

In these formulas $E_3$, $E_h$, $\sigma_3$ and $\sigma_h$ are conditional means and standard
deviations for the three-parameter logistic and histogram model. $E_3(t)$ is
the conditional expected value of the random variable $X_3(t)$

$$X_3(t) = \sum_{i=1}^{n} [U_i \log P_i(t) + (1 - U_i) \log Q_i(t)] \tag{12}$$

computed using the three-parameter logistic model. Thus,

$$E_3(t) = E\{X_3(t) \mid \theta = t\} = \sum_{i=1}^{n} [P_i(t) \log P_i(t) + Q_i(t) \log Q_i(t)]. \tag{13}$$

$\sigma_3(t)$ is the square root of the conditional variance

$$\sigma_3^2(t) = \mathrm{Var}\{X_3(t) \mid \theta = t\} = \sum_{i=1}^{n} P_i(t)\, Q_i(t)\, [\log(P_i(t) / Q_i(t))]^2. \tag{14}$$

Similarly

$$E_h(t) = E\Big\{\sum_{i=1}^{n} \sum_{j=1}^{A+1} \delta_j(V_i) \log P_{ij}(t) \,\Big|\, \theta = t\Big\} = \sum_{i=1}^{n} \sum_{j=1}^{A+1} P_{ij}(t) \log P_{ij}(t) \tag{15}$$

and

$$\sigma_h^2(t) = \mathrm{Var}\Big\{\sum_{i=1}^{n} \sum_{j=1}^{A+1} \delta_j(V_i) \log P_{ij}(t) \,\Big|\, \theta = t\Big\} = \sum_{i=1}^{n} \Big[\sum_{j=1}^{A+1} \sum_{k=1}^{A+1} P_{ij}(t)\, P_{ik}(t) \log P_{ij}(t) \log(P_{ij}(t) / P_{ik}(t))\Big]. \tag{16}$$


These transformations were found to reduce greatly the dependence of $\ell_0$
and $\ell_{0,h}$ on ability. Their rationale is discussed in Section 7 below.
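
The standardization of the three-parameter logistic index can be sketched as below: the conditional mean and variance of the log likelihood are accumulated item by item and then used to center and scale the observed index. The item parameters and the function names are hypothetical, and the sketch assumes that at least one item has a correct-answer probability different from one half, so that the variance is positive.

```python
import math


def p3(t, a, b, c):
    """Three-parameter logistic probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (t - b)))


def moments_3pl(items, t):
    """Conditional mean and variance of the dichotomous log likelihood at t."""
    mean, var = 0.0, 0.0
    for a, b, c in items:
        p = p3(t, a, b, c)
        q = 1.0 - p
        mean += p * math.log(p) + q * math.log(q)
        var += p * q * math.log(p / q) ** 2
    return mean, var


def z3(l0, items, theta_d):
    """Standardized index: (l0 - conditional mean) / conditional sd."""
    mean, var = moments_3pl(items, theta_d)
    return (l0 - mean) / math.sqrt(var)


# Hypothetical items; the observed l0 value is invented for illustration.
items = [(1.0, -0.5, 0.2), (1.5, 0.3, 0.15), (0.8, 1.0, 0.25)]
print(round(z3(-2.5, items, theta_d=0.4), 3))
```

The histogram version follows the same pattern with the sums running over all $A+1$ options of each item instead of the right/wrong pair.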




6.  Data  and  Parameter  Estimation 

Responses  of  approximately  75,000  examinees  to  the  April,  1975 
Scholastic Aptitude Test-Verbal section (SAT-V) were obtained from the
College Entrance Examination Board. A spaced sample of 3,000 response
vectors was formed by selecting the responses of every twentieth examinee,
beginning with the first examinee. Item responses were then scored as
correct, incorrect, omitted, or not-reached and the LOGIST computer
program (Wood & Lord, 1976; Wood, Wingersky & Lord, 1976) was used to estimate
item and person parameters of the three-parameter logistic model. Version
2.B of LOGIST and its default options were used. Convergence was obtained
before  the  maximum  number  of  iterations  was  reached. 

These  item  parameter  estimates  were  then  used  to  construct  histograms 
summarizing  the  pattern  of  option  selection  at  various  ability  levels  for 
the  49,470  examinees  following  the  first  25,000  examinees  in  the  data  set. 

The ability estimate ($\hat\theta_d$) was computed for each of these examinees by the method
of maximum likelihood for the dichotomously scored data. The item parameters
estimated by LOGIST were used in these calculations. Omitted and not-reached items
were ignored in the calculations; Lord's (1974) modified likelihood function
was not used. Examinees were sorted into 25 ability strata based on estimated
ability. The fourth, eighth, twelfth, . . . , 96th percentile points from
the standard normal distribution were used as cutting scores to form the
ability categories. The frequencies of option selection were determined
for each of the 85 SAT-V items in each ability category. Finally, these
frequencies were converted to proportions. The proportion choosing option
$j$ of item $i$ for, say, the lowest ability group was taken as the estimate
of $P_{ij}(\theta)$ for $\theta = -2.054$, the second percentile of the standard normal
distribution.


Proportions choosing option $j$ of item $i$ for the other 24 ability groups
were taken as estimates of the values of $P_{ij}$ at the 6th, 10th, ..., 98th
percentile points of the standard normal distribution. Linear interpolation
between estimated values of $P_{ij}$ was used when an ability estimate was
between percentile points. No estimate of $P_{ij}(\theta)$ was defined outside the
interval [-2.054, 2.054].
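
The histogram estimation procedure described above can be sketched with the standard library alone. The stratum cut points and the abscissas are taken from the text (4th, 8th, ..., 96th and 2nd, 6th, ..., 98th percentiles of the standard normal); the function names and the choice to return `None` outside the interval are assumptions of this sketch.

```python
from bisect import bisect_right
from statistics import NormalDist

STD = NormalDist()
CUTS = [STD.inv_cdf(k / 100) for k in range(4, 100, 4)]  # 24 cutting scores
MIDS = [STD.inv_cdf(k / 100) for k in range(2, 100, 4)]  # 25 abscissas


def stratum(theta_hat):
    """Index (0..24) of the ability category containing theta_hat."""
    return bisect_right(CUTS, theta_hat)


def option_proportions(theta_hats, choices, option):
    """Proportion choosing `option` on one item within each ability stratum."""
    counts, hits = [0] * 25, [0] * 25
    for theta_hat, choice in zip(theta_hats, choices):
        g = stratum(theta_hat)
        counts[g] += 1
        hits[g] += int(choice == option)
    return [h / n if n else float("nan") for h, n in zip(hits, counts)]


def interpolate(props, t):
    """Linear interpolation between the 25 points; None outside the range."""
    if not MIDS[0] <= t <= MIDS[-1]:
        return None
    k = min(bisect_right(MIDS, t), len(MIDS) - 1)
    t0, t1 = MIDS[k - 1], MIDS[k]
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * props[k - 1] + w * props[k]


print(round(MIDS[0], 3), round(MIDS[-1], 3))  # -2.054 2.054
```

With 25 strata of equal normal probability, each category holds roughly four percent of a normally distributed sample, which is the sampling-versus-bias compromise the report discusses.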

We  chose  to  use  25  ability  groups  because  this  number  appeared  to  be  the 
best  compromise  between:  (1)  the  desire  to  reduce  sampling  fluctuations  by 
including  a  large  number  of  examinees  in  each  ability  category;  and  (2)  the 
desire  to  reduce  bias  by  averaging  over  a  short  range  of  abilities.  Graphs 
of  estimated  curves  and  further  details  on  the  estimation  procedure  are  in 
Levine  and  Drasgow  (1983b). 


7.1. Samples with Unrestricted Omitting


To  examine  the  distributions  of  the  various  appropriateness  indices, 
three-parameter  logistic  maximum  likelihood  estimates  of  ability  were 
computed  for  the  first  500  response  vectors  in  the  data  set.  A  total  of  464 
examinees had ability estimates in the interval $-2.054 \le \hat\theta \le +2.054$.

Then $\ell_0$ for the three parameter logistic and histogram models, $z_3$, and
$z_h$ were computed for this sample of 464 nominally normal examinees.

A scatterplot of the three-parameter logistic $\ell_0$ index and $\hat\theta_d$ for
the first 150 examinees is presented in Figure 1. The darkened circles plotted
in Figure 1 are conditional means of $\ell_0$ for the subset of the 464 examinees
with $\hat\theta_d \in [t - .3, t + .3]$, $t = -2.0, -1.8, \ldots, +2.0$. The dependence of $\ell_0$
on estimated ability is apparent in this figure. The mean $\ell_0$ for examinees
with $\hat\theta_d$ less than -1.62 is approximately -36. For examinees with $\hat\theta_d$ near
-1.0, the mean $\ell_0$ is about -42. Mean $\ell_0$ rises as ability increases,
until it reaches roughly -30 for examinees with $\hat\theta_d > +1.64$. Thus an $\ell_0$
score of -42 is quite low in one group of normal examinees and average in
a less able group of normal examinees.
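
The conditional means plotted as darkened circles can be computed along the following lines. The window centers and half-width come from the text; the function name and the four-examinee data set are hypothetical and serve only to show the mechanics.

```python
def windowed_means(theta_hats, index_values, half_width=0.3):
    """Mean index value among examinees whose ability estimate lies within
    half_width of each center t = -2.0, -1.8, ..., +2.0."""
    centers = [round(-2.0 + 0.2 * k, 1) for k in range(21)]
    out = []
    for t in centers:
        vals = [x for th, x in zip(theta_hats, index_values)
                if t - half_width <= th <= t + half_width]
        out.append((t, sum(vals) / len(vals) if vals else None))
    return out


# Hypothetical ability estimates and index values for four examinees.
means = windowed_means([-2.0, -1.9, 0.0, 1.0], [-36.0, -38.0, -40.0, -30.0])
print(means[0])   # (-2.0, -37.0)
print(means[10])  # (0.0, -40.0)
```

Because the windows are 0.6 wide and the centers are 0.2 apart, neighboring windows overlap and the resulting means form a smoothed regression curve rather than independent estimates.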


Figure 1 shows that some adjustment of the three-parameter logistic
$\ell_0$ index is necessary: the regression of $\ell_0$ on $\hat\theta_d$ is not a horizontal
line. Because the conditional distribution of $\ell_0$ given $\hat\theta_d$ varies as a
function of $\hat\theta_d$, it is difficult to interpret the magnitude of $\ell_0$ directly.

The histogram $\ell_{0,h}$ index is plotted against $\hat\theta_d$ in Figure 2 for the
first 150 examinees. The darkened circles are conditional means computed in
the same way as in Figure 1. The dependence of histogram $\ell_{0,h}$ on estimated
ability is even more apparent in this figure.


Insert Figure 2 about here


If $\hat\theta_d$ were equal to $\theta$, the local independence assumption could be
used to reduce the dependence of $\ell_0$ on ability. According to the local
independence assumption, in the subpopulation of examinees with ability
$\theta = t$ the item scores $U_i$ are independent. Therefore the sum $X_3(t)$ given
in equation (12) is approximately normal with mean and variance $E_3(t)$ and
$\sigma_3^2(t)$ given in formulas (13) and (14). Therefore, $z_3$ in equation (10) would
be approximately normal (0,1) for both low and high ability examinees. This
was the rationale supporting the indices $z_3$ and $z_h$. The final expressions
in equations (13) through (16) provide approximations to the actual moments
when parameter estimates are substituted for parameters. Of course $\hat\theta_d$ is
not equal to $\theta$, so the standardization is, at best, approximate and the above
argument is merely heuristic. Nonetheless, it will be shown shortly that the
distributions of the transformed $z$ indices are much less sensitive to ability than the
distributions of the untransformed $\ell$ indices.
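
The heuristic argument can be illustrated by simulation in the idealized case where ability is known exactly. The sketch below draws right/wrong item scores at a fixed true ability, forms the log likelihood at that ability, and standardizes it with its exact conditional mean and standard deviation. The 85 item parameters are randomly generated stand-ins, not the SAT-V estimates, and the seeds are arbitrary.

```python
import math
import random
import statistics


def p3(t, a, b, c):
    """Three-parameter logistic probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (t - b)))


def standardized_draws(t, items, n_draws, seed=0):
    """Simulate the standardized log likelihood at true ability t."""
    rng = random.Random(seed)
    probs = [p3(t, a, b, c) for a, b, c in items]
    mean = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    sd = math.sqrt(sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs))
    draws = []
    for _ in range(n_draws):
        # Each item score is an independent Bernoulli draw (local independence).
        x = sum(math.log(p) if rng.random() < p else math.log(1 - p)
                for p in probs)
        draws.append((x - mean) / sd)
    return draws


# Hypothetical 85-item test with made-up parameters.
param_rng = random.Random(1)
items = [(param_rng.uniform(0.6, 1.6), param_rng.uniform(-2.0, 2.0), 0.2)
         for _ in range(85)]
for t in (-1.5, 0.0, 1.5):
    z = standardized_draws(t, items, n_draws=4000)
    print(t, round(statistics.fmean(z), 2), round(statistics.pstdev(z), 2))
```

Under these assumptions the simulated means stay near zero and the standard deviations near one at every ability level, which is the behavior the standardization aims for; with an estimated ability in place of the true one the match can only be approximate, as the text notes.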

Figure 3 presents the scatterplot of $z_3$ and $\hat\theta_d$ for the first 250
examinees and conditional means obtained from the entire sample. For most
values of $\hat\theta_d$ the conditional means are, as desired, close to the line
$z_3 = 0$. For examinees with extreme values of $\hat\theta_d$, however, the conditional
means are slightly less than zero.


Insert  Figure  3  about  here 


The scatterplot of $z_h$ and $\hat\theta_d$ is presented in Figure 4 for the first
250 examinees. Again, there is little relation between the standardized
index and $\hat\theta_d$. The most striking feature of this plot is the abnormally
large number of examinees with very small values of $z_h$. In the entire
sample of 464 examinees, there were 20 examinees with index scores less than
-2.40; the expected number of scores in this range for a standard normal
variable is only 3.8.


Insert  Figure  4  about  here 


To determine the cause of the unusually frequent small index scores,
response vectors of examinees with $z_h$ less than -2.4 were inspected.
Interestingly, many of these response vectors had very large numbers of
omitted items. Of the 20 examinees with the smallest values of $z_h$, 11
examinees (55 percent) omitted 30 or more items. In contrast, only 16 of
the 444 examinees with $z_h$ greater than -2.4 omitted 30 or more items, or
3.6 percent. Further inspection of the 27 examinees with 30 or more omits
showed that their mean $\hat\theta_d$ was -.14 and their mean $z_h$ was -1.87. Thus,
this group appears quite ordinary with respect to ability, but has very
atypical response patterns in that they omit more than 35 percent of the test.

A second group of nominally normal examinees also had very low index
scores. There were seven examinees (in the sample of 464) who reached less
than 77 percent of the items on the test. Their average $z_h$ index score


To further investigate the relation between high omitting and $z_h$,
the response vectors of examinees 501 through 1,000 were analyzed. Note
that this sample serves to replicate findings from the first sample.

A total of 456 of these examinees had estimated abilities in the range from
-2.05 to 2.05, 23 omitted 35 percent or more of the test items, 6 reached
less than 77 percent of the test items and 16 had values of $z_h$ less than
-2.4.

The relations between omitting and $z_h$ that were noted for the first
sample of examinees were confirmed in this second sample. In particular
the mean $z_h$ for examinees who omitted more than 35 percent of the test was
-1.85. Seven of these examinees had $z_h$ values of less than -2.4. Even
stronger results occurred for the six examinees who reached less than 77
percent of the test items: all six had $z_h$ scores of less than -2.4. The
mean $z_h$ for this group was -3.08.

A total of 430 examinees in the second SAT-V sample omitted less than
35 percent of the test and reached 77 percent or more of the test items.
In this group, 6 examinees had $z_h$ scores of less than -2.4. The expected
number for a standard normal population is 3.5. In contrast, of the 26
examinees who omitted more than 35 percent of the test, or reached less
than 77 percent of the test items, 10 had $z_h$ values of less than -2.4;
the expected number is .2.
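
The expected counts quoted in this section (3.8 for the sample of 464, 3.5 for the 430 examinees, and .2 for the 26 examinees) are simply the group size multiplied by the standard normal probability of falling below -2.4. A short check, using only the Python standard library:

```python
from statistics import NormalDist

# Probability that a standard normal variable falls below -2.4.
tail = NormalDist().cdf(-2.4)

for n in (464, 430, 26):
    print(n, round(n * tail, 1))
# 464 3.8
# 430 3.5
# 26 0.2
```

Comparing these expectations with the observed counts of 20, 6 and 10 shows where the excess of very low index scores is concentrated.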

It is not surprising that high omitting rates and not finishing the
exam cause $z_h$ to indicate aberrance. Perhaps the most important point
to note about the relation between high omitting and $z_h$ is that it is
not high omitting per se that causes very extreme $z_h$ values. Instead,
it is the too frequent omitting of easy items, or items with very effective
distractors. For example, examinees who reached less than 77 percent of
the test items did not attempt several easy items; across the last 10
items on each of the two SAT-V subsections, there were 5 items with $\hat{b}_i$
values less than -1.0, and 9 items with $\hat{b}_i$ values less than 0.0.

Because examinees who omit more than 35 percent of the test or reach
less than 77 percent of test items appear to receive spuriously low test
scores, they are excluded from all subsequent analyses.

7.2.  Restricted  Omitting  Sample 

To investigate further the distributions of z_h and z_3, a large
sample of nominally normal examinees was formed. First, three parameter
logistic maximum likelihood estimates of ability were computed for
examinees 10,001 to 14,000 on the SAT-V tape. Then, examinees who met
the following three criteria were included in the sample:

1. Less than 35 percent of the test items were omitted;

2. 77 percent or more of the test items were reached;

3. Estimated ability was in the range -2.05 < θ̂ < 2.05.

The z_3 and z_h appropriateness indices were computed for the 3478 examinees
who satisfied these criteria.

Figure 5 presents the cumulative frequency distributions for z_3 and
z_h in the sample of 3478 nominally normal examinees. The cumulative
distribution function of the standard normal distribution is also presented.
From Figure 5, it is apparent that the distributions of z_3 and z_h are
slightly asymmetric: there are relatively few examinees with index scores
between -2.0 and 0.0, and relatively many with index scores between 0.0 and
1.2. Both empirical distributions are significantly different from the
standard normal distribution (α = .01) by the Kolmogorov-Smirnov test.
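
As an illustration of the test named above, the following sketch computes a one-sample Kolmogorov-Smirnov statistic against the standard normal distribution. It is a minimal sketch, not the authors' program: the function names are hypothetical, and the critical value uses the usual large-sample approximation, which is an assumption about how the test might be carried out.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ks_statistic(scores):
    """Largest gap between the empirical CDF and the standard normal CDF."""
    xs = sorted(scores)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical_value(n, alpha):
    """Large-sample approximation to the two-sided critical value (assumed)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


# With n = 3478 and alpha = .01, even small departures from normality
# exceed the critical value, which is roughly 0.028.
print(round(ks_critical_value(3478, 0.01), 4))
```

With a sample this large, a statistically significant result is compatible with a distribution that is still close to normal in practical terms.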




Insert  Figure  5  about  here 

For the purposes of appropriateness measurement, it is not essential
that z_3 and z_h follow standard normal distributions. It is important
that each index be distributed similarly across values of θ̂. Table 1
presents information concerning the left tails of the distributions of z_3
and z_h within five mutually exclusive ability intervals. The left tails
of the conditional distributions of z_3 are relatively similar across the
five ability intervals. The largest difference between cumulative proportions
at any cutting score is only .03. The left tails of the conditional cumulative
proportions of z_h exhibit less invariance; here the largest difference is .054.


Insert  Table  1  about  here 

The relatively large differences in conditional distributions of z_h
for different ability levels may result from the presence of truly aberrant
response patterns in the sample of nominally normal response vectors rather
than inaccuracies in the standardization approximation. (It will be shown
in Section 8 that z_3 and z_h are more sensitive to some types of aberrance
for examinees of very high or very low ability.) To investigate this
possibility, the research described in the present subsection was replicated
using simulation data.


and  Histogram  Model 

Samples  of  4,000  simulated  examinees  were  generated  using  the  three 
parameter  logistic  model  and  histogram  model.  Hypothetical  probabilities  of 
correct  responses  on  dichotomously  scored  items  and  hypothetical  probabilities 
of  option  selection  on  polychotomously  scored  items  were  computed  using  the 
three  parameter  logistic  model  ICC  estimates  and  histogram  option  response 
function  estimates  described  in  Section  6.  For  each  simulated  examinee,  an 
ability  was  sampled  from  the  standard  normal  distribution  truncated  to  the 
interval [-2.05, 2.05]. Responses to 85 items were then simulated as 85
independent  multinomials  with  response  probabilities  obtained  by  substituting 
sampled  ability  in  the  three  parameter  logistic  ICCs  and  histogram  option 
response  functions. 
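
The generation procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the item parameters are hypothetical, the logistic scaling constant 1.7 is assumed, and rejection sampling is one of several ways to draw from the truncated normal distribution.

```python
import math
import random


def sample_truncated_normal(rng, lo=-2.05, hi=2.05):
    """Draw an ability from the standard normal restricted to [lo, hi]."""
    while True:
        theta = rng.gauss(0.0, 1.0)
        if lo <= theta <= hi:
            return theta


def p_correct_3pl(theta, a, b, c):
    """Three parameter logistic probability of a correct response.

    The scaling constant 1.7 is an assumption of this sketch.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def simulate_option(rng, probs):
    """Draw one category index from a single-trial multinomial."""
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against rounding in the last category


def simulate_dichotomous(rng, theta, items):
    """Simulate 0/1 responses; items is a list of (a, b, c) tuples."""
    responses = []
    for a, b, c in items:
        p = p_correct_3pl(theta, a, b, c)
        responses.append(simulate_option(rng, [1.0 - p, p]))
    return responses


rng = random.Random(0)
hypothetical_items = [(1.0, 0.0, 0.2)] * 85  # illustrative parameters only
theta = sample_truncated_normal(rng)
responses = simulate_dichotomous(rng, theta, hypothetical_items)
print(len(responses), sum(responses))
```

For the polychotomous case, simulate_option can be called directly with the option probabilities (including the omit category) evaluated at the sampled ability.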

Ability was estimated for each response vector by the methods described in
Sections 7.1 and 7.2. Response vectors for which |θ̂| > 2.05 were discarded
so that the results described in this section would be comparable to the results
presented in Section 7.2. The z_3 and z_h indices were computed using
estimated ability. Simulated examinees were then sorted into five ability
categories on the basis of θ (not θ̂).

The cumulative proportions of appropriateness index scores for the five
ability intervals are shown in Table 2. Note that (1) there was no model
misspecification or parameter estimation problem here because the item parameters
and option response probabilities used to compute index values were identical
to those used to generate response vectors; and (2) there were no truly
aberrant response vectors present. Although the cumulative proportions in
Table 2 tend to be somewhat smaller than the corresponding proportions in




Table 1, the overall pattern is similar. Again, the largest difference in
conditional proportions for the three parameter logistic index z_3 is .03. The
largest difference for the z_h proportions is .046, which again suggests that
there is less invariance of the conditional z_h distribution than of the z_3
distribution.


Insert  Table  2  and  Table  3  about  here 


Shown in Table 3 are the conditional proportions of index values obtained
when simulated examinees are sorted into ability categories on the basis of
θ̂ rather than θ. The cumulative proportions for z_3 are similar to the
proportions shown in Table 2. Curiously, z_h shows more invariance across
ability categories in Table 3 than in Table 2.
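
The "largest difference" summary used for these tables can be reproduced from the tabled proportions. The sketch below uses the Table 3 entries as they are read from this copy of the report (some digits may be affected by reproduction quality), and the function name is illustrative.

```python
# Rows are cutting scores -2.58, -1.96, -1.64, -1.30; columns are the five
# ability intervals (low to high). Values as read from Table 3.
TABLE3_Z3 = [
    [.005, .005, .005, .003, .001],
    [.024, .022, .021, .014, .017],
    [.038, .046, .042, .032, .032],
    [.062, .097, .080, .073, .061],
]
TABLE3_ZH = [
    [.004, .000, .003, .001, .000],
    [.011, .008, .011, .008, .005],
    [.034, .020, .028, .020, .027],
    [.065, .049, .064, .050, .073],
]


def largest_difference(rows):
    """Largest spread across ability intervals at any cutting score."""
    return max(max(row) - min(row) for row in rows)


print(round(largest_difference(TABLE3_Z3), 3))
print(round(largest_difference(TABLE3_ZH), 3))
```

A smaller value indicates that the left tail of the index distribution depends less on the ability interval.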

7.4.  Summary 

The standardized l_0 indices, z_3 and z_h, have empirical distributions
that are reasonably close to the standard normal distribution. The
Kolmogorov-Smirnov tests indicate that z_3 and z_h do not exactly follow the
standard normal distribution. Furthermore, Tables 1 and 2 indicate that the
distributions of z_3 and z_h are not completely independent of estimated
ability. However, Figure 5 and Tables 1, 2, and 3 suggest that these effects
are fairly small. In addition, these tables show that high rates of detection
of aberrant response vectors will not result solely from differences in ability
distributions. For this reason, z_3 and z_h are used as appropriateness indices
in the next section.




8.  Appropriateness  Measurement  with  Standardized  Indices 

8.1.  Overview 

In this section, we compare the distributions of the two appropriateness
indices in samples of normal examinees to the distributions in samples
of examinees whose response vectors have been modified to simulate spuriously
high and spuriously low examinees. The power of an appropriateness index
is indicated by the extent to which the index separates the normal and
aberrant groups.

8.2.  Normal  and  Aberrant  Groups 

The normal group consists of the 3,478 nominally normal, low omitting
examinees with -2.05 ≤ θ̂ ≤ 2.05 previously described.

The aberrant groups were formed by the following process. First,
only examinees with less than 35 percent of the test omitted and 77 percent
or more test items reached were considered. Then, starting with examinee
1,001 on the SAT-V tape, 300 examinees with estimated ability in each of the
five ability categories were selected from the next 2,000 records. The θ̂
categories, termed quintiles, are:


Quintile     θ̂ range

Q1           [-2.05, -.80];
Q2           (-.80, -.24];
Q3           (-.24, .24];
Q4           (.24, .80];
Q5           (.80, 2.05].

These quintiles of response vectors were then subjected to various types
of tampering to simulate aberrance.




The k% spuriously high modification consisted of randomly selecting
k% of the examinee's original responses without replacement. Then each
selected response was rescored as correct, regardless of the original
response. Note that omits were treated as any other response category and
rescored as correct if selected. Ten, 20, and 30% modifications were applied
to each of the five quintiles.
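
A minimal sketch of the spuriously high modification follows. It is illustrative only: the function name, the representation of responses as option labels, and the rounding of k% of the items to a whole number are assumptions, since the report does not state how fractional item counts were handled.

```python
import random


def spuriously_high(responses, key, k_percent, rng):
    """Rescore a random k% of the items as correct.

    responses and key are equal-length lists of option labels. Items are
    drawn without replacement; omits are eligible like any other response.
    """
    n = len(responses)
    n_modified = round(n * k_percent / 100.0)  # rounding rule is assumed
    chosen = rng.sample(range(n), n_modified)
    modified = list(responses)  # leave the original vector untouched
    for i in chosen:
        modified[i] = key[i]
    return modified, sorted(chosen)


# Hypothetical 85-item vector in which every original response is wrong.
key = ["A"] * 85
original = ["B"] * 85
modified, chosen = spuriously_high(original, key, 20, random.Random(1))
print(len(chosen), sum(m == k for m, k in zip(modified, key)))
```

Because already-correct items can also be selected, the number of responses that actually change is usually smaller than k% for able examinees.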

The k% spuriously low modification was slightly more complex. First,
each examinee's response vector was inspected to determine the proportion,
p, of omitted items. Then k% of the examinee's original responses were
selected randomly and without replacement. Each selected item was rescored
as an omit with probability p. Options A through E were each selected with
probability (1-p)/5. Note that this procedure reflects the examinee's original
propensity to omit items. Again, 10, 20, and 30% modifications were applied
to the quintiles.
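
The spuriously low modification can be sketched in the same way. The code below is an illustration under assumptions: the omit label, the function name, and the rounding of k% of the items are hypothetical, and the examinee's proportion of omitted items is called p_omit here.

```python
import random

OMIT = "O"                           # hypothetical label for an omitted item
OPTIONS = ["A", "B", "C", "D", "E"]


def spuriously_low(responses, k_percent, rng):
    """Replace a random k% of responses with omits or random options."""
    n = len(responses)
    p_omit = responses.count(OMIT) / n         # original propensity to omit
    n_modified = round(n * k_percent / 100.0)  # rounding rule is assumed
    modified = list(responses)
    for i in rng.sample(range(n), n_modified):
        if rng.random() < p_omit:
            modified[i] = OMIT
        else:
            # Each of the five options has probability (1 - p_omit) / 5.
            modified[i] = rng.choice(OPTIONS)
    return modified


original = ["A"] * 85  # hypothetical vector with no omits
modified = spuriously_low(original, 30, random.Random(2))
print(sum(a != b for a, b in zip(original, modified)))
```

An examinee who never omits receives only random option choices on the selected items, whereas a frequent omitter receives mostly omits.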

After tampering with the response vectors in a quintile, ability was
estimated for each modified response vector using the three parameter
logistic model. Then z_3 and z_h were computed for the modified response
vectors in the quintile.

8.3.  ROC  Curves 

We used ROC curves to display the effectiveness of an index for detecting
simulated aberrance. Here, a value of the appropriateness index, say t,
is specified. Then the proportions of normal and aberrant response vectors
with index values less than or equal to t are determined. Let

x(t) = proportion of normal examinees with index values ≤ t
y(t) = proportion of aberrant examinees with index values ≤ t.

Plotting the <x(t), y(t)> pairs for several values of t produces an
ROC curve. An ROC curve that indicates good detection of aberrance is
one that rises sharply from the origin to the upper left hand corner of
the plot. A random classification system would produce an ROC curve that
lies along the 45 degree diagonal line. To conserve space, we only plot
ROC curves for low false alarm rates: x(t) < .20. An elementary
description of the use of ROC curves in appropriateness measurement is given
by Hulin, Drasgow, and Parsons (1983).
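
The construction of the ROC points can be sketched directly from the definitions of x(t) and y(t). The index values below are made up for illustration, and the function name is hypothetical.

```python
def roc_points(normal_scores, aberrant_scores, cutoffs):
    """Return (x(t), y(t)) pairs for each cutting score t.

    x(t) is the false alarm rate among normal examinees and y(t) is the
    detection rate among aberrant examinees, both counting index values
    at or below t.
    """
    points = []
    for t in cutoffs:
        x = sum(s <= t for s in normal_scores) / len(normal_scores)
        y = sum(s <= t for s in aberrant_scores) / len(aberrant_scores)
        points.append((x, y))
    return points


# Hypothetical index values, not data from the report.
normal = [-3.0, -1.0, 0.0, 1.0, 2.0]
aberrant = [-4.0, -3.0, -2.5, 0.0]
print(roc_points(normal, aberrant, [-2.4, 0.0]))
```

Restricting attention to cutoffs with x(t) below .20 gives the portion of the curve plotted in the figures.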


Results  for  the  Spuriously  Low  Modification 


Figure 6 presents the ROC curves for the spuriously low modification,
with panel (A) corresponding to the z_3 index and panel (B) corresponding
to the z_h index. The 30% modification is indicated by circles, the 20%
modification by squares, and the 10% modification by a solid line. The 45
degree diagonal line is also plotted.


Insert  Figure  6  about  here 

The panels in Figure 6 portray an orderly, coherent pattern of
detectability. In each case, tampering with more items leads to greater
detectability. This is indicated by the ROC curves for the 30% modifications
always rising more sharply than the other two curves, and the 20% modification
ROC curves rising more sharply than the 10% curves.

It is clear that detectability increases with increasing ability. For
example, the lowest detection rates occur for the first quintile where
examinees had estimated ability in the range -2.05 to -.80 prior to tampering.
It is obvious that the spuriously low treatment would have relatively little
effect on the responses of these low ability examinees. In contrast,
examinees in quintile 5 all had estimated ability in the range .80 to 2.05
prior to tampering. Here the effects of the spuriously low modification on
each examinee's response vector are much larger, and this is reflected in
very high detection rates. Note that detectability increases evenly as
pre-tampering ability increases; there is not a particular ability level below
which appropriateness measurement is completely ineffective and above which
appropriateness measurement is quite effective.

Despite the crude estimates of the histogram model's option response functions,
it is clear that the z_h index is substantially superior to the z_3 index. ROC
curves for z_h generally rise more sharply than the corresponding z_3 ROC curves,
and hence provide better aberrance detection.

A reason that the histogram model affords better aberrance detection
is straightforward: aberrant responses, as conceptualized and simulated
here, are essentially random. Thus easy items can be missed and some extremely
improbable (P_ij less than .01) incorrect options are selected. The dichotomous
test model is sensitive to incorrect responses to easy items, but is
insensitive to the pattern of incorrect option selection. In contrast, the z_h
index is affected by the selection of an incorrect option.


Results for the Spuriously High Modification


Figure 7 presents the results for the spuriously high modification. Again,
the 30, 20, and 10% modifications are indicated by circles, squares, and solid
lines, respectively. Clearly, tampering with more items increases detectability.
As expected, detectability decreases with increasing ability in Figure 7.
Providing the answer key for, say, 20% of the exam to bright examinees has a
relatively small impact on their answers, and is not likely to be detectable by
appropriateness measurement techniques. In contrast, providing low ability
examinees with answers to 20% of the exam will have substantial effects on
their responses. As seen in Figure 7, this type of spuriously high score is
detectable, especially by the z_3 index.


Insert  Figure  7  about  here 


Perhaps the most interesting result obtained from the spuriously high
modification is the finding that the z_3 index is clearly superior to the
z_h index. This result appears counter-intuitive because the histogram model,
which provides a fuller description of the test-taking behavior of normal
examinees, should provide more power in detecting departures from normal
test-taking behavior. We believe that the superiority of the dichotomous
test model for detecting spuriously high examinees is chiefly a result of
the particular class of appropriateness indices under consideration (i.e.,
the l_0 class). This class of indices is subject to a "swamping" effect
when utilized to detect spuriously high response vectors on polychotomously
scored multiple choice tests. Other, more sophisticated indices may not be
affected similarly.

The swamping effect is perhaps best described by example. Table 4
presents the frequency distribution of the terms that compose l_0 and l_0h
for the first examinee in Quintile 2:

    l_{0,3}(i) = u_i \log P_i(\hat{\theta}) + (1 - u_i) \log Q_i(\hat{\theta})

and

    l_{0,h}(i) = \sum_{j=1}^{A+1} \delta_j(v_i) \log P_{ij}(\hat{\theta})

respectively.
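
To make the two kinds of terms concrete, the sketch below evaluates a dichotomous term and a polychotomous term for a single hypothetical item. Natural logarithms, zero-based option indices, and all probabilities are assumptions of this illustration; none of the numbers come from the SAT-V data.

```python
import math


def dichotomous_term(u, p_correct):
    """Per-item term under the dichotomous model; u is 1 if correct, else 0."""
    return u * math.log(p_correct) + (1 - u) * math.log(1.0 - p_correct)


def polychotomous_term(choice, option_probs):
    """Per-item term under the polychotomous model.

    option_probs holds the probability of each option at the estimated
    ability, with the omit category last; choice is a zero-based index.
    """
    return math.log(option_probs[choice])


def count_below(terms, cutoff=-2.0):
    """Number of terms falling below the cutoff."""
    return sum(t < cutoff for t in terms)


# Hypothetical item: option 0 is correct, option 2 is a rarely chosen
# distractor, and the last entry is the omit category.
probs = [0.60, 0.25, 0.01, 0.04, 0.05, 0.05]
print(round(dichotomous_term(0, probs[0]), 2))  # any wrong answer
print(round(polychotomous_term(2, probs), 2))   # the rare distractor
```

The same incorrect response contributes a mild term to the dichotomous index but a very large negative term to the polychotomous index when the chosen distractor is improbable.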



Note that prior to tampering there was only one term less than -2.0 for the
three parameter logistic model but there were 17 such terms for the histogram
model. After tampering, there were 5 terms less than -2.0 for the dichotomous
model (an increase of 400%) and 22 terms for the histogram model (an increase
of 29%). It is interesting to note that three of the smallest four
terms for the logistic model after tampering were items rescored as correct
during tampering. In contrast, none of the three smallest histogram terms
had been subjected to tampering, and only two of the smallest 11 terms had
been affected by tampering.


Insert  Table  4  about  here 


This example illustrates the swamping effect. A normal number of
relatively rare incorrect option selections and mistakes on easy items (as
noted above, 9 for the examinee in Table 4) camouflages correct answers to hard
items produced by the spuriously high modification. This occurs because
the probabilities for some incorrect options are very nearly zero in the
histogram model, and most examinees choose a few of these improbable incorrect
options during the 85 item SAT-V exam. Swamping occurs much less in the three
parameter logistic model because the model does not differentiate between the
various incorrect options.




9.  Discussion 

From the research presented in this article it is apparent that
standardization substantially reduces the confounding between measured
appropriateness and ability: the conditional distributions of z_3 and z_h are
more nearly invariant across ability levels than are the conditional
distributions of l_0 and l_0h. Thus, standardized index scores for examinees of
different abilities can be compared more easily when making classification
decisions.

It must be emphasized that our implementation of the standardization
concept (i.e., transforming index values to make the conditional distribution
of an appropriateness index independent of estimated ability) can be improved
in many ways. An improved estimate of the conditional distribution of an
index can be obtained, if not analytically then by simulation. A more "robust"
estimate of ability can be obtained by reducing the relative contribution of
improbable responses to the estimate (Wainer & Wright, 1980; Jones, 1982).

Our studies show that standardization is needed and is easily implemented.

In pilot studies for the research described by Levine and Rubin (1979),
it was found that l_0 provided very low rates of detection of aberrant
response patterns in samples of examinees with unrestricted omitting. Levine
and Rubin found much higher detection rates when samples were restricted to
low rates of omitting. To handle higher rates of omitting we have introduced
polychotomous models. It is interesting to note, however, that standardization
of the dichotomous model appropriateness index l_0 allows high detection rates
in samples with only weak restrictions on omitting rates. These detection
rates seem to be nearly as high as detection rates for l_0 in samples with
low omitting rates.




Improved detection of response patterns modified by the spuriously low
treatment was also obtained from use of the z_h index. This index is sensitive
to the pattern of incorrect option selection and consequently facilitates
identification of examinees who choose unusual options when they respond
incorrectly.

The z_h index was not as effective as the z_3 index in identifying
spuriously high response patterns. Thus, we are left with the question of
whether our particular choice of a polychotomous model appropriateness index
was unfortunate: Would a different polychotomous model appropriateness index
provide much higher detection rates? In our current research, we have
identified an optimal appropriateness index for spuriously high response
patterns. Our preliminary results indicate that z_3 detects spuriously high
patterns at a rate much closer to the optimal index than z_h, but not so well
as to discourage refinements of z_3 and the formulation of an alternative
polychotomous index.

Omitted items are ignored when computing the standardized dichotomous
appropriateness index. The standardized polychotomous index, in contrast,
treats nonresponse as the selection of the (A+1)th option on an A option
multiple choice item. This "option" is then treated in a fashion similar
to the other options when computing z_h. Examinees who omitted a large
number of items or who failed to reach many items frequently received very
low z_h scores. Because it seems likely that these examinees would receive
higher test scores (which are a linear function of number correct minus
one-fourth of number incorrect) if they answered more items, it appears that
z_h has identified one form of naturally occurring spuriously low test score.
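
The scoring rule mentioned in parentheses explains why answering more items would tend to raise these examinees' scores. The sketch below computes the expected change in number correct minus one-fourth of number incorrect from answering one item; it assumes, for illustration only, that the examinee guesses uniformly among the options not yet ruled out and that the correct option is among them.

```python
from fractions import Fraction


def expected_gain(n_plausible_options):
    """Expected score change from guessing among the remaining options.

    A correct answer adds 1 and an incorrect answer subtracts 1/4; an
    omitted item contributes 0. Uniform guessing is an assumption.
    """
    p_correct = Fraction(1, n_plausible_options)
    return p_correct - (1 - p_correct) * Fraction(1, 4)


for n in (5, 4, 3, 2):
    print(n, expected_gain(n))
```

Blind guessing on a five-option item has zero expected gain, whereas any partial information that eliminates an option makes answering preferable to omitting.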

The very low appropriateness scores observed among nominally normal
examinees with high nonresponse rates can be seen as casting doubt on the
unidimensional, local independence assumptions of the histogram model. It
seems likely that there are substantial individual differences between
examinees in rates of responding and willingness to guess or use partial
information. These departures from unidimensionality, though obvious in
retrospect, constitute a serendipitous finding of considerable practical
importance. Excessively conservative examinees who are reluctant to use partial
information, examinees who perseverate on difficult items, and other able, low
scoring examinees with high nonresponse rates indeed do have inappropriately
low number right scores. It seems desirable to identify and counsel them.
From the testing organization's point of view, it seems wise to exclude them
from item parameter estimation samples since their presence may introduce
additional sampling error (and possibly bias) in item parameter estimates.




References 


Birnbaum, A. (1968) Some latent trait models and their use in inferring
an examinee's ability. In F.M. Lord & M.R. Novick, Statistical theories
of mental test scores. Reading, Mass.: Addison-Wesley.

Bock, R.D. (1972) Estimating item parameters and latent ability when responses
are scored in two or more nominal categories. Psychometrika, 37, 29-51.

Drasgow, F. (1982) Choice of test model for appropriateness measurement.
Applied Psychological Measurement, 6, 297-308.

Hulin, C.L., Drasgow, F., & Parsons, C.K. (1983) Item response theory:
Application to psychological measurement. Homewood, Ill.: Dow Jones-Irwin.

Jones, D.H. (1982) Tools of robustness for item response theory. Research Report
82-41. Princeton, N.J.: Educational Testing Service.

Levine, M.V. (1983) The trait in latent trait theory. In D.J. Weiss (Ed.),
Proceedings of the 1982 Item Response Theory/Computerized Adaptive Testing
Conference. Minneapolis: University of Minnesota, Department of Psychology,
Computerized Adaptive Testing Laboratory.

Levine, M.V. & Drasgow, F. (1982) Appropriateness measurement: Review, critique
and validating studies. British Journal of Mathematical and Statistical
Psychology, 35, 42-56.

Levine, M.V. & Drasgow, F. (1983) Appropriateness measurement: Validating studies
and variable ability models. In D.J. Weiss (Ed.), New horizons in testing:
Latent trait test theory and computerized adaptive testing. New York:
Academic Press, in press. (a)

Levine, M.V. & Drasgow, F. (1983) The relation between incorrect option choice
and estimated ability. Educational and Psychological Measurement, in press. (b)

Levine, M.V. & Rubin, D.B. (1979) Measuring the appropriateness of multiple
choice test scores. Journal of Educational Statistics, 4, 269-290.

Lord, F.M. (1969) Estimating true-score distributions in psychological testing
(An empirical Bayes estimation problem). Psychometrika, 34, 259-299.

Lord, F.M. (1970) Item characteristic curves as estimated without knowledge
of their mathematical form - A confrontation of Birnbaum's logistic model.
Psychometrika, 35, 43-50.

Lord, F.M. (1974) Estimation of latent ability and item parameters when there
are omitted responses. Psychometrika, 39, 247-264.

Samejima, F. (1981) Final report: Efficient methods of estimating the operating
characteristics of item response categories and challenge to a new model
for the multiple-choice item. Technical Report. Knoxville, Tenn.:
Department of Psychology, University of Tennessee.

Wainer, H. & Wright, B.D. (1980) Robust estimation of ability in the Rasch model.
Psychometrika, 45, 373-391.

Wood, R.L. & Lord, F.M. (1976) A user's guide to LOGIST. Research Memorandum
76-4. Princeton, N.J.: Educational Testing Service.

Wood, R.L., Wingersky, M.S., & Lord, F.M. (1976) LOGIST - A computer program
for estimating examinee ability and item characteristic curve parameters.
Research Memorandum 76-6. Princeton, N.J.: Educational Testing Service.




Acknowledgements 

This  work  was  supported  by  ONR  contracts  N00014-79C-0752,  NR  154  445 
and  N00014-83K-0397,  NR  150  518.  We  are  grateful  to  the  College  Entrance 
Examination  Board  for  providing  access  to  the  Scholastic  Aptitude  Test  data. 




Table  1 

Cumulative  Proportions  of  Appropriateness  Index 
Scores  at  Various  Cutting  Scores  for  3478  SAT-V  Examinees 


         Cutting   Normal                  Ability Interval*
Index    Score     Curve     Low   Mod. Low   Ave.   Mod. High   High

[Table entries are not legible in this copy.]


*The  ability  intervals  are:  low  =  [-2.05,  -0.80],  moderately  low  =  (-0.80,  -0.24], 
average  =  (-0.24,  0.24],  moderately  high  =  (0.24,  0.80],  and  high  =  (0.80,  2.05] 




Table  3 

Cumulative  Proportions  of  Appropriateness  Index 


         Cutting   Normal                  Ability Interval
Index    Score     Curve     Low    Mod. Low   Ave.   Mod. High   High

z_3      -2.58     .005      .005   .005       .005   .003        .001
         -1.96     .025      .024   .022       .021   .014        .017
         -1.64     .050      .038   .046       .042   .032        .032
         -1.30     .097      .062   .097       .080   .073        .061

Total N in Ability Interval  742    794        765    752         847

z_h      -2.58     .005      .004   .000       .003   .001        .000
         -1.96     .025      .011   .008       .011   .008        .005
         -1.64     .050      .034   .020       .028   .020        .027
         -1.30     .097      .065   .049       .064   .050        .073

Total N in Ability Interval  759    754        793    747         841

Note: Simulated examinees were sorted into ability intervals on the basis of θ̂.
The ability intervals are: low = [-2.05, -0.80], moderately low = (-0.80, -0.24],
average = (-0.24, 0.24], moderately high = (0.24, 0.80], and high = (0.80, 2.05].




Table  4 


Frequency  Distribution  of  Terms 

for  the  First  Examinee  in  Quintile  2 


                 Three Parameter    Histogram
Interval         Logistic Model     Model

Distribution for Original Responses

(-1.0, 0.0]      74                 37
(-2.0, -1.0]     9                  31
(-3.0, -2.0]     1                  15
(-4.0, -3.0]     0                  2
Omit             1                  -

                 z_3 = 1.22         z_h = 0.57

Distribution After 20% Spuriously High Modification

[Entries for the modified response vector are not legible in this copy.]



1. Three parameter logistic l_0 plotted against θ̂ for 150 nominally
normal examinees.

2. Histogram l_0h plotted against θ̂ for 150 nominally normal examinees.

3. Standardized three parameter logistic appropriateness index z_3 plotted
against θ̂ for 250 nominally normal examinees.

4. Standardized histogram appropriateness index z_h plotted against θ̂
for 250 nominally normal examinees.

5. Cumulative proportions for the standard normal distribution and the
standardized z_3 and z_h appropriateness indices.

6. ROC curves for the spuriously low manipulations. Panel (A) presents
results for the z_3 appropriateness index and panel (B) presents results
for the z_h appropriateness index.

7. ROC curves for the spuriously high manipulations. Panel (A) presents
results for the z_3 appropriateness index and panel (B) presents results
for the z_h appropriateness index.


[Figure 5. Vertical axis: Cumulative Proportion.]

[Figure 6, panels (a) and (b). Vertical axis: Proportion of Spuriously Low
Response Vectors Detected. Horizontal axis: Proportion of Normal Response
Vectors Misclassified, 0 to 0.2.]


P-1050 


Illinois/Levine 


9-Hmr-B4 


Page  1 


Navy 

1  Dr.  Ed  Aiken 
Navy  Personnel  RAD  Center 
San  Diego,  CA  92152 

1  Dr.  Nick  Bond 
OH  ice  o<  Naval  Research 
Liaison  DHice,  Far  East 
APO  San  Francisco.  CA  96501 

1  Lt.  Alexander  Bory 
Applied  Psychology 
Measurement  Division 
NAHRL 

NA5  Pensacola.  Ft  3250S 

1  Dr.  Robert  Carroll 
NAVOP  115 

Washington  ,  DC  20370 

1  Dr.  Stanley  Col  Iyer 
OFf ice  o*  Naval  Technology 
800  N.  Quincy  Street 
Arlington.  VA  22217 

1  CDF:  Hike  Curran 
OHice  o*  Naval  Research 
BOO  N.  Suincy  St. 

Code  270 

Arlington,  VA  22217 

1  Dr.  John  Ellis 
Navy  Personnel  RAD  Center 
San  Diego.  CA  92252 

1  DR.  PAT  FEDERICO 
Code  P13 
NPRDC 

San  Diego,  CA  92152 

1  Dr.  Cathv  Fernandes 
Navy  Personnel  RAD  Center 
San  Diego,  CA  92152 

1  Dr.  Norma*  i.  Kerr 
Chief  of  Naval  Technical  Training 
Naval  Air  Station  Neeohis  (751 
Millington,  TN  3B05* 

1  Dr.  Leonard  Kroeker 
Navy  Personnel  RAD  Center 
San  Diego,  CA  92152 


Navy 

1 Dr. William L. Maloy (02)
Chief of Naval Education and Training
Naval Air Station
Pensacola, FL 32508

1 Dr. Kneale Marshall
Chairman, Operations Research Dept.
Naval Post Graduate School
Monterey, CA 93940

1 Dr. James McBride
Navy Personnel R&D Center
San Diego, CA 92152

1 Cdr Ralph McCumber
Director, Research & Analysis Division
Navy Recruiting Command
4015 Wilson Boulevard
Arlington, VA 22203

1 Dr. George Moeller
Director, Behavioral Sciences Dept.
Naval Submarine Medical Research Lab
Naval Submarine Base
Groton, CT 06349

1 Dr William Montague
NPRDC Code 13
San Diego, CA 92152

1 Library, Code P201L
Navy Personnel R&D Center
San Diego, CA 92152

1 Technical Director
Navy Personnel R&D Center
San Diego, CA 92152

6 Commanding Officer
Naval Research Laboratory
Code 2627
Washington, DC 20390

6 Personnel & Training Research Group
Code 442PT
Office of Naval Research
Arlington, VA 22217

1 LT Frank C. Petho, MSC, USN (Ph.D)
CNET (N-432)
NAS
Pensacola, FL 32508


Marine  Corps 

1 H. William Greenup
Education Advisor (E031)
Education Center, MCDEC
Quantico, VA 22134

1 Jerry Lehnus
CAT Project Office
HQ Marine Corps
Washington, DC 20380

1 Director, Office of Manpower Utilization
HQ, Marine Corps (MPU)
BCB, Bldg. 2009
Quantico, VA 22134

1 Headquarters, U. S. Marine Corps
Code MPI-20
Washington, DC 20380

1 Special Assistant for Marine
Corps Matters
Code 100M
Office of Naval Research
800 N. Quincy St.
Arlington, VA 22217

1 DR. A.L. SLAFKOSKY
SCIENTIFIC ADVISOR (CODE RD-1)
HQ, U.S. MARINE CORPS
WASHINGTON, DC 20380

1 Major Frank Yohannan, USMC
Headquarters, Marine Corps
(Code MPI-20)
Washington, DC 20380


Army

1 Technical Director
U. S. Army Research Institute for the
Behavioral and Social Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Dr. Kent Eaton
Army Research Institute
5001 Eisenhower Blvd.
Alexandria, VA 22333

1 Dr. Myron Fischl
U.S. Army Research Institute for the
Social and Behavioral Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Dr. Milton S. Katz
Training Technical Area
U.S. Army Research Institute
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Dr. Clessen Martin
Army Research Institute
5001 Eisenhower Blvd.
Alexandria, VA 22333

1 Dr. William E. Nordbrock
FMC-ADCO Box 25
APO, NY 09710

1 Mr. Robert Ross
U.S. Army Research Institute for the
Social and Behavioral Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Dr. Robert Sasmor
U. S. Army Research Institute for the
Behavioral and Social Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Dr. Joyce Shields
Army Research Institute for the
Behavioral and Social Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Dr. Hilda Wing
Army Research Institute
5001 Eisenhower Ave.
Alexandria, VA 22333


Air  Force 

1 Technical Documents Center
Air Force Human Resources Laboratory
WPAFB, OH 45433

1 U.S. Air Force Office of Scientific
Research
Life Sciences Directorate, NL
Bolling Air Force Base
Washington, DC 20332

1 Air University Library
AUL/LSE 76/443
Maxwell AFB, AL 36112

1 Dr. Earl A. Alluisi
HQ, AFHRL (AFSC)
Brooks AFB, TX 78235

1 Mr. Raymond E. Christal
AFHRL/MOE
Brooks AFB, TX 78235

1 Dr. Alfred R. Fregly
AFOSR/NL
Bolling AFB, DC 20332

1 Dr. Patrick Kyllonen
AFHRL/MOE
Brooks AFB, TX 78235

1 Dr. Roger Pennell
Air Force Human Resources Laboratory
Lowry AFB, CO 80230

1 Dr. Malcolm Ree
AFHRL/MP
Brooks AFB, TX 78235


Department of Defense

12 Defense Technical Information Center
Cameron Station, Bldg 5
Alexandria, VA 22314
Attn: TC

1 Military Assistant for Training and
Personnel Technology
Office of the Under Secretary of Defense
for Research & Engineering
Room 3E129, The Pentagon
Washington, DC 20301

1 Dr. W. Steve Sellman
Office of the Assistant Secretary
of Defense (MRA & L)
2B269 The Pentagon
Washington, DC 20301

1 Major Jack Thorpe
DARPA
1400 Wilson Blvd.
Arlington, VA 22209

1 Dr. Robert A. Wisher
OUSDRE (ELS)
The Pentagon, Room 3D12C
Washington, DC 20301


Civilian  Agencies 

1 Dr. Vern W. Urry
Personnel R&D Center
Office of Personnel Management
1900 E Street NW
Washington, DC 20415

1 Mr. Thomas A. Warm
U. S. Coast Guard Institute
P. O. Substation 16
Oklahoma City, OK 73169

1 Dr. Frank Withrow
U. S. Office of Education
400 Maryland Ave. SW
Washington, DC 20202

1 Dr. Joseph L. Young, Director
Memory & Cognitive Processes
National Science Foundation
Washington, DC 20550


Private  Sector 

1 Dr. James Algina
University of Florida
Gainesville, FL 326

1 Dr. Erling B. Andersen
Department of Statistics
Studiestraede 6
1455 Copenhagen
DENMARK

1 1 Psychological Research Unit
KBR-3-44 Attn
Northbourne House
Turner ACT 2601
AUSTRALIA

1 Dr. Alan Baddeley
Medical Research Council
Applied Psychology Unit
15 Chaucer Road
Cambridge CB2 2EF
ENGLAND

1 Dr. Isaac Bejar
Educational Testing Service
Princeton, NJ 08450

1 Dr. Menucha Birenbaum
School of Education
Tel Aviv University
Tel Aviv, Ramat Aviv 69976
Israel

1 Dr. R. Darrell Bock
Department of Education
University of Chicago
Chicago, IL 60637

1 Dr. Robert Brennan
American College Testing Programs
P. O. Box 168
Iowa City, IA 52243

1 Dr. Glenn Bryan
6206 Poe Road
Bethesda, MD 20817

1 Bundesministerium der Verteidigung
-Referat P II 4-
Psychological Service
Postfach 1326
D-5300 Bonn 1
F. R. of Germany


1 Dr. Ernest R. Cadotte
307 Stokely
University of Tennessee
Knoxville, TN 37916

1 Dr. John B. Carroll
409 Elliott Rd.
Chapel Hill, NC 27514

1 Dr. Norman Cliff
Dept. of Psychology
Univ. of So. California
University Park
Los Angeles, CA 90007

1 Dr. Allan M. Collins
Bolt Beranek & Newman, Inc.
50 Moulton Street
Cambridge, MA 02138


1 Dr. Susan Embertson
PSYCHOLOGY DEPARTMENT
UNIVERSITY OF KANSAS
Lawrence, KS 66045

1 ERIC Facility-Acquisitions
4833 Rugby Avenue
Bethesda, MD 20014

1 Dr. Benjamin A. Fairbank, Jr.
McFann-Gray & Associates, Inc.
5825 Callaghan
Suite 225
San Antonio, TX 78228

1 Dr. Leonard Feldt
Lindquist Center for Measurement
University of Iowa
Iowa City, IA 52242


1 Dr. Lynn A. Cooper
LRDC
University of Pittsburgh
3939 O'Hara Street
Pittsburgh, PA 15213

1 Dr. Hans Crombag
Education Research Center
University of Leyden
Boerhaavelaan 2
2334 EN Leyden
The NETHERLANDS

1 CTB/McGraw-Hill Library
2500 Garden Road
Monterey, CA 93940

1 Dr. Dattprasad Divgi
Syracuse University
Department of Psychology
Syracuse, NY 13210

1 Dr. Hei-Ki Dong
Ball Foundation
Room 314, Building B
800 Roosevelt Road
Glen Ellyn, IL 60137

1 Dr. Fritz Drasgow
Department of Psychology
University of Illinois
603 E. Daniel St.
Champaign, IL 61820


1 Dr. Richard L. Ferguson
The American College Testing Program
P.O. Box 168
Iowa City, IA 52240

1 Univ. Prof. Dr. Gerhard Fischer
Liebiggasse 5/3
A 1010 Vienna
AUSTRIA

1 Professor Donald Fitzgerald
University of New England
Armidale, New South Wales 2351
AUSTRALIA

1 Dr. Dexter Fletcher
University of Oregon
Department of Computer Science
Eugene, OR 97403

1 Dr. John R. Frederiksen
Bolt Beranek & Newman
50 Moulton Street
Cambridge, MA 02138

1 Dr. Janice Gifford
University of Massachusetts
School of Education
Amherst, MA 01002


1 Dr. Robert Glaser
Learning Research & Development Center
University of Pittsburgh
3939 O'Hara Street
PITTSBURGH, PA 15260

1 Dr. Bert Green
Johns Hopkins University
Department of Psychology
Charles & 34th Street
Baltimore, MD 21218

1 DR. JAMES G. GREENO
LRDC
UNIVERSITY OF PITTSBURGH
3939 O'HARA STREET
PITTSBURGH, PA 15213

1 Dr. Ron Hambleton
School of Education
University of Massachusetts
Amherst, MA 01002

1 Dr. Delwyn Harnisch
University of Illinois
242b Education
Urbana, IL 61801

1 Dr. Paul Horst
677 G Street, #184
Chula Vista, CA 90010

1 Dr. Lloyd Humphreys
Department of Psychology
University of Illinois
603 East Daniel Street
Champaign, IL 61820

1 Dr. Steven Hunka
Department of Education
University of Alberta
Edmonton, Alberta
CANADA


1 Dr. Huynh Huynh
College of Education
University of South Carolina
Columbia, SC 29208

1 Dr. Douglas H. Jones
Advanced Statistical Technologies
Corporation
10 Trafalgar Court
Lawrenceville, NJ 08148

1 Professor John A. Keats
Department of Psychology
The University of Newcastle
N.S.W. 2308
AUSTRALIA

1 Dr. Scott Kelso
Haskins Laboratories, Inc
270 Crown Street
New Haven, CT 06510

1 CDR Robert S. Kennedy
Canyon Research Group
1040 Woodcock Road
Suite 227
Orlando, FL 32803

1 Dr. William Koch
University of Texas-Austin
Measurement and Evaluation Center
Austin, TX 78703

1 Dr. Stephen Kosslyn
1236 William James Hall
33 Kirkland St.
Cambridge, MA 02138

1 Dr. Alan Lesgold
Learning R&D Center
University of Pittsburgh
3939 O'Hara Street
Pittsburgh, PA 15260


1 Dr. Earl Hunt
Dept. of Psychology
University of Washington
Seattle, WA 98105

1 Dr. Jack Hunter
2122 Coolidge St.
Lansing, MI 48906

1 Dr. Michael Levine
Department of Educational Psychology
210 Education Bldg.
University of Illinois
Champaign, IL 61801


1 Dr. Charles Lewis
Faculteit Sociale Wetenschappen
Rijksuniversiteit Groningen
Oude Boteringestraat 23
9712GC Groningen
Netherlands

3 Dr. Robert Linn
College of Education
University of Illinois
Urbana, IL 61801

1 Mr. Phillip Livingston
Systems and Applied Sciences Corporation
6811 Kenilworth Avenue
Riverdale, MD 20840

3 Dr. Robert Lockman
Center for Naval Analysis
200 North Beauregard St.
Alexandria, VA 22311

1 Dr. Frederic M. Lord
Educational Testing Service
Princeton, NJ 08541

1 Dr. James Lumsden
Department of Psychology
University of Western Australia
Nedlands W.A. 6009
AUSTRALIA

1 Dr. Don Lyon
P. O. Box 44
Higley, AZ 85236

1 Dr. Gary Marco
Stop 31-E
Educational Testing Service
Princeton, NJ 08453

1 Dr. Scott Maxwell
Department of Psychology
University of Notre Dame
Notre Dame, IN 46556

1 Dr. Samuel T. Mayo
Loyola University of Chicago
820 North Michigan Avenue
Chicago, IL 60611


1 Mr. Robert McKinley
American College Testing Programs
P.O. Box 168
Iowa City, IA 52243

1 Dr. Barbara Means
Human Resources Research Organization
300 North Washington
Alexandria, VA 22314

1 Professor Jason Millman
Department of Education
Stone Hall
Cornell University
Ithaca, NY 14853

1 Dr. Allen Munro
Behavioral Technology Laboratories
1845 Elena Ave., Fourth Floor
Redondo Beach, CA 90277

1 Dr. W. Alan Nicewander
University of Oklahoma
Department of Psychology
Oklahoma City, OK 73069

1 Dr. Donald A. Norman
Cognitive Science, C-015
Univ. of California, San Diego
La Jolla, CA 92093

1 Dr. Melvin R. Novick
356 Lindquist Center for Measurement
University of Iowa
Iowa City, IA 52242

1 Dr. James Olson
WICAT, Inc.
1875 South State Street
Orem, UT 84057

1 Dr. Jesse Orlansky
Institute for Defense Analyses
1801 N. Beauregard St.
Alexandria, VA 22311

1 Wayne M. Patience
American Council on Education
GED Testing Service, Suite 20
One Dupont Circle, NW
Washington, DC 20036


1 Dr. James A. Paulson
Portland State University
P.O. Box 751
Portland, OR 97207

1 Dr. James W. Pellegrino
University of California,
Santa Barbara
Dept. of Psychology
Santa Barbara, CA 92106

1 Dr. Mark D. Reckase
ACT
P. O. Box 168
Iowa City, IA 52243

1 Dr. Thomas Reynolds
University of Texas-Dallas
Marketing Department
P. O. Box 638
Richardson, TX 75030

1 Dr. Andrew M. Rose
American Institutes for Research
1055 Thomas Jefferson St. NW
Washington, DC 20007

1 Dr. Ernst Z. Rothkopf
Bell Laboratories
Murray Hill, NJ 07974

1 Dr. Lawrence Rudner
403 Elm Avenue
Takoma Park, MD 20012

1 Dr. J. Ryan
Department of Education
University of South Carolina
Columbia, SC 29208

1 Frank L. Schmidt
Department of Psychology
Bldg. GG
George Washington University
Washington, DC 20052

1 Dr. Walter Schneider
Psychology Department
607 E. Daniel
Champaign, IL 61820


1 Lowell Schoer
Psychological & Quantitative
Foundations
College of Education
University of Iowa
Iowa City, IA 52242

1 Dr. Kazuo Shigemasu
7-9-24 Kugenuma-Kaigan
Fujusawa 251
JAPAN

1 Dr. Edwin Shirkey
Department of Psychology
University of Central Florida
Orlando, FL 32826

1 Dr. William Sims
Center for Naval Analysis
200 North Beauregard Street
Alexandria, VA 22311

1 Dr. Robert Sternberg
Dept. of Psychology
Yale University
Box 11A, Yale Station
New Haven, CT 06520

1 Martha Stocking
Educational Testing Service
Princeton, NJ 08541

1 Dr. Peter Stoloff
Center for Naval Analysis
200 North Beauregard Street
Alexandria, VA 22311

1 David E. Stone, Ph.D.
Hazeltine Corporation
7680 Old Springhouse Road
McLean, VA 22102

1 Dr. William Stout
University of Illinois
Department of Mathematics
Urbana, IL 61801

1 DR. PATRICK SUPPES
INSTITUTE FOR MATHEMATICAL STUDIES IN
THE SOCIAL SCIENCES
STANFORD UNIVERSITY
STANFORD, CA 94305


1 Dr. Hariharan Swaminathan
Laboratory of Psychometric and
Evaluation Research
School of Education
University of Massachusetts
Amherst, MA 01003

1 Dr. Kikumi Tatsuoka
Computer Based Education Research Lab
252 Engineering Research Laboratory
Urbana, IL 61801

1 Dr. Maurice Tatsuoka
220 Education Bldg
1310 S. Sixth St.
Champaign, IL 61820

1 Dr. David Thissen
Department of Psychology
University of Kansas
Lawrence, KS 66044

1 Dr. Douglas Towne
Univ. of So. California
Behavioral Technology Labs
1845 S. Elena Ave.
Redondo Beach, CA 90277

1 Dr. Robert Tsutakawa
Department of Statistics
University of Missouri
Columbia, MO 65201

1 Dr. V. R. R. Uppuluri
Union Carbide Corporation
Nuclear Division
P. O. Box Y
Oak Ridge, TN 37830

1 Dr. David Vale
Assessment Systems Corporation
2233 University Avenue
Suite 310
St. Paul, MN 55114

1 Dr. Kurt Van Lehn
Xerox PARC
3333 Coyote Hill Road
Palo Alto, CA 94304

1 Dr. Howard Wainer
Division of Psychological Studies
Educational Testing Service
Princeton, NJ 08540


1 Dr. Michael T. Waller
Department of Educational Psychology
University of Wisconsin-Milwaukee
Milwaukee, WI 53201

1 Dr. Brian Waters
HumRRO
300 North Washington
Alexandria, VA 22314

1 Dr. David J. Weiss
N660 Elliott Hall
University of Minnesota
75 E. River Road
Minneapolis, MN 55455

1 Dr. Donald C. Weitzman
Mitre Corporation
1820 Dolley Madison Blvd
McLean, VA 22102

1 Dr. Christopher Wickens
Department of Psychology
University of Illinois
Champaign, IL 61820

1 Dr. Rand R. Wilcox
University of Southern California
Department of Psychology
Los Angeles, CA 90007

1 German Military Representative
ATTN: Wolfgang Wildegrube
Streitkraefteamt
D-5300 Bonn 2
4000 Brandywine Street, NW
Washington, DC 20016

1 Dr. Bruce Williams
Department of Educational Psychology
University of Illinois
Urbana, IL 61801

1 Dr. Wendy Yen
CTB/McGraw Hill
Del Monte Research Park
Monterey, CA 93940




